Modeling, Querying, and Mining Uncertain XML Data
Authors
Abstract
This chapter deals with data mining in uncertain XML data models, where the uncertainty typically comes from imprecise automatic processes. We first review the literature on modeling uncertain data, starting with well-studied relational models and then moving to their semistructured counterparts. We focus on a specific probabilistic XML model that can represent arbitrary finite distributions of XML documents and has been extended to also allow continuous distributions of data values. We summarize previous work on querying this uncertain data model and show how to apply the corresponding techniques to several data mining tasks, illustrated through use cases on two running examples.
Similar resources
On Uncertain Probabilistic Data Modeling
Uncertainty in data arises from various sources, including the data itself, data mapping, and data policy. The data itself can be uncertain for several reasons. For example, data from a sensor network, the Internet of Things, or Radio Frequency Identification is often inaccurate and uncertain because of device or environmental factors. For data mapping, integrated data from various heterogono...
A Probabilistic Approach to XML Data Management
Uncertainty is ubiquitous in data and can take various forms. Usually, this is not formally taken into account: only the most likely data interpretation is kept for future processing, or all probable choices of correct information above a threshold are maintained. We claim this is not sufficient. There is a need for managing the imprecision in data more rigorously, and the current thesis addres...
Similarity Search and Mining in Uncertain Databases
Managing, searching and mining uncertain data has received much attention in the database community recently due to new sensor technologies and new ways of collecting data. There are a number of challenges in terms of collecting, modelling, representing, querying, indexing and mining uncertain data. In its scope, the diversity of approaches addressing these topics is very high because the underl...
Mining tree-based association rules from XML documents
The increasing amount of XML datasets available to casual users increases the necessity of investigating techniques to extract knowledge from these data. Data mining is widely applied in the database research area in order to extract frequent correlations of values from both structured and semistructured datasets. In this work we describe an approach to mine Tree-based association rules from XM...
An efficient XML query pattern mining algorithm for ebXML applications in e-commerce
Providing efficient querying of XML data for ebXML applications in e-commerce is crucial, as XML has become the most important technique for exchanging data over the Internet. ebXML is a set of specifications for companies to exchange their data in e-commerce. Following the ebXML specifications, companies have a standard method to exchange business messages, communicate data, and business rules in e-c...